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CONVERGENCE AND PREDICTION OF PRINCIPAL 
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University of North Carolina at Chapel Hill 

A number of settings arise in which it is of interest to predict 
Principal Component (PC) scores for new observations using data 
from an initial sample. In this paper, we demonstrate that naive ap- 
proaches to PC score prediction can be substantially biased toward 
in the analysis of large matrices. This phenomenon is largely related 
to known inconsistency results for sample eigenvalues and eigenvec- 
tors as both dimensions of the matrix increase. For the spiked eigen- 
value model for random matrices, we expand the generality of these 
results, and propose bias-adjusted PC score prediction. In addition, 
we compute the asymptotic correlation coefficient between PC scores 
from sample and population eigenvectors. Simulation and real data 
examples from the genetics literature show the improved bias and 
numerical properties of our estimators. 

1. Introduction. Principal component analysis (PCA) [19] is one of the 
leading statistical tools for analyzing multivariate data. It is especially pop- 
ular in genetics/genomics, medical imaging and chemometrics studies where 
high-dimensional data is common. PCA is typically used as a dimension 
reduction tool. A small number of top ranked principal component (PC) 
scores are computed by projecting data onto spaces spanned by the eigen- 
vectors of sample covariance matrix, and are used to summarize data char- 
acteristics that contribute most to data variation. These PC scores can be 
subsequently used for data exploration and/or model predictions. For ex- 
ample, in genome- wide association studies (GWAS), PC scores are used to 
estimate ancestries of study subjects and as covariates to adjust for popula- 
tion stratification [24, 27]. In gene expression microarray studies, PC scores 
are used as synthetic "eigen-genes" or "meta-genes" intended to represent 
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and discover gene expression patterns that might not be discernible from 
single-gene analysis [30]. 

Although PCA is widely applied in a number of settings, much of our 
theoretical understanding rests on a relatively small body of literature. Gir- 
shick [12] introduced the idea that the eigenvectors of sample covariance ma- 
trix are maximum likelihood estimators. Here, a key concept in a population 
view of PCA is that the data arise as p-variate values from a distinct set 
of n independent samples. Later, the asymptotic distribution of eigenvalues 
and eigenvectors of the sample covariance matrix (i.e., the sample eigenval- 
ues and eigenvectors) were derived for the situation where n goes to infinity 
and p is fixed [2, 13]. With the development of modern high-throughput 
technologies, it is not uncommon to have data where p is comparable in 
size to n, or substantially larger. Under the assumption that p and n grow 
at the same rate, that is p/n — > 7 > 0, there has been considerable effort to 
establish convergence results for sample eigenvalues and eigenvectors (see re- 
view [5] ) . The convergence of the sample eigenvalues and eigenvectors under 
the "spiked population" model proposed by Johnstone [18] has also been 
established [7, 23, 26]. For this model, it is well known that the sample 
eigenvectors are not consistent estimators of the eigenvectors of popula- 
tion covariance (i.e., the population eigenvectors) [17, 23, 26]. Furthermore, 
Paul [26] has derived the degree of discrepancy in terms of the angle be- 
tween the sample and population eigenvectors, under Gaussian assumptions 
for < 7 < 1. More recently, Nadler [23] has extended the same result to the 
more general 7 > using a matrix perturbation approach. 

These results have considerable potential practical utility in understand- 
ing the behavior of PC analysis and prediction in modern datasets, for which 
p may be large. The practical goals of this paper focus primarily on the pre- 
diction of PC scores for samples which were not included in the original PC 
analysis. For example, gene expression data of new breast cancer patients 
may be collected, and we might want to estimate their PC scores in order 
to classify their cancer sub-type. The recalculation of PCs using both new 
and old data might not be practical. For example, if the application of PCs 
from gene expression is used as a diagnostic tool in clinical applications. For 
GWAS analysis, it is known that PC analysis which includes related indi- 
viduals tends to generate spurious PC scores which do not reflect the true 
underlying population substructures. To overcome this problem, it is com- 
mon practice to include only one individual per family /sibship in the initial 
PC analysis. Another example arises in cross-validation for PC regression, 
in which PC scores for the test set might be derived using PCA performed 
on the training set [16]. For all of these applications, the predicted PC scores 
for a new sample are usually estimated in the "naive" fashion, in which the 
data vector of the new sample is multiplied by the sample eigenvectors from 
the original PC analysis. Indeed, there appears to be relatively little recog- 
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nition in the genetics or data mining literature that this approach may lead 
to misleading conclusions. 

For low-dimensional data, where p is fixed as n increases or otherwise 
much smaller than n, the predicted PC scores are nearly unbiased and well- 
behaved. However, for high-dimensional data, particularly with p> n, they 
tend to be biased and shrunken toward 0. The following simple example of a 
stratified population with three strata illustrates the shrinkage phenomenon 
for predicted PC scores. We generated a training data set with n = 100 and 
p = 5000. Among the 100 samples, 50 are from stratum 1, 30 are from stra- 
tum 2 and the rest from stratum 3. For each stratum, we first created a 
p-dimensional mean vector /x fc (k = 1,2,3). Each element of each mean vec- 
tor was created by drawing randomly with replacement from {—0.3,0,0.3}, 
and thereafter considered a fixed property of the stratum. Then for each 
sample from the fcth stratum, its p covariates were simulated from the mul- 
tivariate normal distribution MVN(/x fc ,4I), where I is the p x p identity 
matrix. A test dataset with the same sample size and /x fc vectors was also 
simulated. Figure 1 shows that the predicted PC scores for the test data 
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Fig. 1. Simulation results for p = 5000 and n = (50, 30, 20) . Different symbols represent 
different groups. White background color represents the training set and grey background 
color represents the test set. (a) First 2 PC score plot of all simulated samples, (b) Center 
of each group. 
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are much closer to compared to the scores from the training data. This 
shrinkage phenomenon may create a serious problem if the predicted PC 
scores are used to classify new test samples, perhaps by similarity to pre- 
vious apparent clusters in the original data. In addition, the predicted PC 
scores may produce incorrect results if used for downstream analyses (e.g., 
as covariates in association analyses). 

In this paper, we investigate the degree of shrinkage bias associated with 
the predicted PC scores, and then propose new bias-adjusted PC score es- 
timates. As the shrinkage phenomenon is largely related to the limiting 
behavior of the sample eigenvectors, our first step is to describe the dis- 
crepancy between the sample and population eigenvectors. To achieve this 
purpose, we follow the assumption that p and n both are large and grow at 
the same rate. By applying and extending results from random matrix the- 
ory, we establish the convergence of the sample eigenvalues and eigenvectors 
under the spiked population model. We generalize Theorem 4 of Paul [26], 
which describes the asymptotic angle between sample and population eigen- 
vectors, to non-Gaussian random variables for any 7 > 0. We further derive 
the asymptotic angle between PC scores from sample eigenvectors and pop- 
ulation eigenvectors, and the asymptotic shrinkage factor of the PC score 
predictions. Finally, we construct estimators of the angles and the shrinkage 
factor. The theoretical results are presented in Section 2. 

In Section 3, we report simulations to assess the finite sample accuracy 
of the proposed asymptotic angle and shrinkage factor estimators. We also 
show the potential improvements in prediction accuracy for PC regression by 
using the bias-adjusted PC scores. In Section 4, we apply our PC analysis to 
a real genome-wide association study, which demonstrates that the shrinkage 
phenomenon occurs in real studies and that adjustment is needed. 

2. Method. 

2.1. General setting. Throughout this paper, we use T to denote matrix 
transpose, — > to denote convergence in probability, and ^4' to denote almost 
sure convergence. Let A = diag(Ai, A2, • ■ • , A p ), apxp matrix with Ai > A2 > 
• ■ • > A p , and E = [ei, . . . , e p ], apxp orthogonal matrix. 

Define the pxn data matrix, X as [xi , . . . , x n ] , where Xj is the p-dimension- 
al vector corresponding to the jth sample. For the remainder of the paper, 
we assume the following. 

Assumption 1. X = EA 1,/2 Z, where Z = {zij} is a p x n matrix whose 
elements Zij's are i.i.d. random variables with E(zij) = 0,E(zfj) = 1 and 

00. 

Although the z$,-'s are i.i.d., Assumption 1 allows for very flexible covari- 
ance structures for X, and thus the results of this paper are quite general. 
The population covariance matrix of X is XI = EAE T . The sample covari- 
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ance matrix S equals 

S = XX T /n = EA 1/2 ZZ T A 1/2 E T /n. 

The Afc's are the underlying population eigenvalues. The spiked popula- 
tion model defined in [18] assumes that all the population eigenvalues are 1, 
except the first m eigenvalues. That is, Ai > A2 > • • • > A m > A m +i = • • • = 
A p = 1. The spectral decomposition of the sample covariance matrix is 

S = UDU T , 

where D = diag(di, cfe, • • • , d p ) is a diagonal matrix of the ordered sample 
eigenvalues and U = [ui, ... , u p ] is the corresponding p x p sample eigen- 
vector matrix. Then the PC score matrix is P = [pi, p 2 , . . . , p n ] , where 
= u^X is the uth sample PC score. For a new observation x ncw , its 
predicted PC score is similarly defined as U T x new with the vth (PC) score 
equal to q v = u^x n 



MlCW • 



2.2. Sample eigenvalues and eigenvectors. Under the classical setting of 
fixed p, it is well known that the sample eigenvalues and eigenvectors are 
consistent estimators of the corresponding population eigenvalues and eigen- 
vectors [3]. Under the "large p, large n" framework, however, the consistency 
is not guaranteed. The following two lemmas summarize and extend some 
known convergence results. 

Lemma 1. Let p/n— s- 7 > as n— > 00. 
(i) When 7 = 0, 



(1) d v 
(ii) When 7 > 0, 



X v , for v < m, 
1, for v > to. 



(2) d v 



a.s. ( p(K), forv<k, 
^lU + v^) 2 , forv = k + l, 



where k is the number of X v greater than 1 + ^y, and p{x) = x{l + r y/(x — 1)). 

The result in (ii) is due to Baik and Silverstein [7], while the proof of (i) 
can be found in Section 6.3. The result in (i) shows that when 7 = 0, the 
sample eigenvalues converge to the corresponding population eigenvalues, 
which is consistent with the classical PC result where p is fixed. The result 
in (ii) shows that for any nonzero 7, d v is no longer a consistent estimator 
of X v . However, a consistent estimator of A^ can be constructed from (2). 
Define 



.! = rf+l- 7 + y / (ci + l-7) 2 -4rf 
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Then p^(d v ) is a consistent estimator of when \ v > l + y^y. Furthermore, 
Baik, Ben Arous and Peche [6] have shown the -v/n-consistency of d v to p(A„), 
and Bai and Yao [4] have shown that d v is asymptotically normal. 

Lemma 2. Suppose p/n —> 7 >0 as n—> 00. Let (•,•) be an inner product 
between two vectors. Under the assumption of multiplicity one: 

(i) if < 7 < 1, and the Zij 's follow the standard normal distribution, 

then 



(3) \(e v ,u v )\ 



\ 0, if 1< A„ < 1 + ^7; 

(ii) removing the normal assumption on the Zij 's, the following weaker 
convergence result holds for all 7 > : 

(a\ I/p „ if\ v >l + ^f, 

W l\ e «' u WI^|0, i/KA„<l + ^7. 



Here ftx) = ^(1 - 



The inner product between unit vectors is the cosine angle between these 
two. Thus, Lemma 2 shows the convergence of the angle between popula- 
tion and sample eigenvectors. For (i), Paul [26] proved it for 7 < 1; while 
Nadler [23] obtained the same conclusion for 7 > using the matrix per- 
turbation approach under the Gaussian random noise model. We relax the 
Gaussian assumption on z and prove (ii) for 7 > in Section 6.4. The result 
of (ii) is general enough for the application of PCA to, for example, genome- 
wide association mapping, where each entry of X is a standardized variable 
of SNP genotypes, which are typically coded as {0,1,2}, corresponding to 
discrete genotypes. 

2.3. Sample and predicted PC scores. In this section, we first discuss 
convergence of the sample PC scores, which forms the basis for the inves- 
tigation of the shrinkage phenomenon of the predicted PC scores. For the 
sample PC scores, we have the following theorem. 



Theorem 1. Let = e^X/y / nA^, the normalized vth PC score derived 
from a corresponding population eigenvector, e v , and p v = p v /\/nd v , the 
normalized vth sample PC score. Suppose p/n — s- 7 > as n — > 00. Under 
the multiplicity one assumption, 



rx\ 1/ - vi p ) \ l ~T\ T\2' j/A„>l + v ^y J 

(5) \{&>,Pv)\ ->• < V (At, - ir 

l 0, 1 < At, < 1 + y/j. 

The proof can be found in Section 6.7. In PC analysis, the sample PC 
scores are typically used to estimate certain latent variables (largely the 
PC scores from population eigenvectors) that represent the underlying data 
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structures. The above result allows us to quantify the accuracy of the sample 
PC scores. Note that here {g v ,Pv) 

is the correlation coefficient between g t , 
and p v . Compared to (3) in Lemma 2, the angle between the PC scores is 
smaller than the angle between their corresponding eigenvectors. 

Before we formally derive the asymptotic shrinkage factor for the pre- 
dicted PC scores, we first describe in mathematical terms the shrinkage 
phenomenon that was demonstrated in the Introduction. Note that the first 
population eigenvector ei satisfies 

ei = argmax£'((a r x) 2 ) 

a : a T a=l 

for a random vector x that follows the same distribution of the Xj's. For the 
data matrix X, its first sample eigenvector ui satisfies 



ui = argmaxN(a T Xj) 2 . 



a : a T a=l j = \ 

Assuming that ui and the new sample x new are independent of each other, 
we have 

£((ufx ncw ) 2 ) =E(£ , (ufx ncw Xn Cw uf |ui)) = £(u[E(x ncw x^ w )u[) 

(6) 

= £(uf Suf ) < ef Sei = £((ef x ncw ) 2 ). 
Since the u^x^'s (J = 1,. . . , n) follow the same distribution, 

(7) nE((eJ^) = E (^2(eJ^ < i^ufx,) 2 ) = n£((uf x,) 2 ). 

By (6) and (7), we can show that 

E((uJ x new ) 2 ) < E((eJ x ncw ) 2 ) = E((eJ Xj ) 2 ) < E((uJ Xj ) 2 ), 

which demonstrates the shrinkage feature of the predicted PC scores. The 
amount of the shrinkage, or the asymptotic shrinkage factor, is given by the 
following theorem. 

Theorem 2. Suppose p/n— > 7 > os n-> 00, A„ > 1 + ^7. Under the 
multiplicity one assumption, 



(8) 

VJ 

where p v j is the j th element of p v . 



E(q A„-l 



The proof is given in Section 6.8. We call (A„ — l)/(A„ + 7 — 1), the (asymp- 
totic) shrinkage factor for a new subject. As shown, the shrinkage factor is 
smaller than 1 if 7 > 0. Quite sensibly, it is a decreasing function of 7 and 
an increasing function of A^. The bias of the predicted PC score can be 
potentially large for those high-dimensional data where p is substantially 
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greater than n, and/or for the data with relatively minor underlying struc- 
tures where A„ is small. 

2.4. Rescaling of sample eigenvalues. The previous theorems are based 
on the assumption that all except the top m eigenvalues are equal to 1. Even 
under the spiked eigenvalue model, some rescaling of the sample eigenvalues 
may be necessary with real data. 

For a given data, let its ordered population eigenvalues A* = {CAi, . . . , C-^m> 
C, • • • , C}> where £ 7^ 1, and its corresponding sample eigenvalues D* = , . . . , 
<i*}. We can show that (4), (8) and (5) still hold under such circumstances. 
However, is no longer a consistent estimator of X v , because 

<™'C A «( 1 + a^Ti) =Cp(A„). 

To address this issue, Baik and Silverstein [7] have proposed a simple ap- 
proach to estimate £. In their method, the top significant large sample eigen- 
values are first separated from the other grouped sample eigenvalues. Then £ 
is estimated as the ratio between the average of the grouped sample eigenval- 
ues and the mean determined by the Marcenko-Pastur law [22] . To separate 
the eigenvalues, they have suggested to use a screeplot of the percent vari- 
ance versus component number. However, for real data, we may not be able 
to clearly separate the sample eigenvalues in such a manner and readily 
apply the approach. Thus, we need an automated method which does not 
require a clear separation of the sample eigenvalues. 

The expectation of the sum of the sample eigenvalues when £ = 1 is 

(p \ p 
^2 d v = £(trace(S)) = trace(£(S)) = trace(S) = ^ A„. 
v=l J v=l 

Thus, the sum of the rescaled eigenvalues is expected to be close to (X^^Li + 
p — m). Let r v = d%,/(YX=i C C) ana - °^ De a properly rescaled eigenvalue, then 
d v should be very close to r v (Y^=x X v +p — m). Note that pj (YlvLi ^v+P~ 
m) — > 1 for fixed m and A„. Thus, pr v is a properly adjusted eigenvalue. 
However, for finite n and p, the difference between p and (Yl^Li ^v+P~ m) 
can be substantial, especially when the first several A„'s are considerably 
larger than 1. To reduce this difference, we propose a novel method which 
iteratively estimates the (X^i X v +p — m) and d v . 

1. Initially set d Vi o =pr v . 

2. For the Ith iteration, set A t)j / = p~~ l {d v i_{) for d v ^_\ > (1 + -y/7) 2 , and 

A^; = 1 for d v j-\ < (1 + ^/7) 2 - Define fc; as the number of \ v /s that are 
greater than 1, and let 

dv,l= (^^^v,i+P-kijr v . 
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3 - If 12v=i ^v,l + P~k converges, let 

d v — dy,l 

and stop. Otherwise, go to step 2. 

The consistency of d v to p(X v ) is shown in the following theorem. 

Theorem 3. Let d v be the rescaled sample eigenvalue from the proposed 
algorithm. Then, for X v > 1 + y^y with multiplicity one, 

d v A- p(X v ). 

Since p~ l (d v ) A X v , 4>(p~ x (dv)) 2 is a consistent estimator of 4>(X V ) 2 . Com- 
bining this fact with Theorems 1 and 2, we can obtain the bias-adjusted PC 
score q* 

p -\d v ) + 1 -\ 

P L (d v ) ~ 1 

and the asymptotic correlation coefficient between g v and 
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3. Simulation. First, we applied our bias-adjustment process to the sim- 
ulated data described in the Introduction. Our estimated asymptotic shrink- 
age factors are 0.465 and 0.329 for the first and second PC scores, respec- 
tively. The scatter plot of the top two bias-adjusted PC scores is given in 
Figure 2. After the bias adjustment, the predicted PC scores of the test data 
are comparable to those of the training data. This indicates that our method 
is effective in correcting for the shrinkage bias. 

Next, we conducted a new simulation to check the accuracy of our esti- 
mators. For the jth sample (j = 1, . . . ,n), its ith variable was generated as 



X\Zij, 


i 


= 1 


X2Z1J, 


i 


= 2 


Zij, 


i 


>2 



where Ai > A2 > 1 and z^j ~ iV(0,2 2 ). Under this setting, Ai and A2 are the 
first and the second population eigenvalues. The first and second population 
eigenvectors are e± = {1, 0, . . . , 0} and e2 = {0, 1, 0, . . . , 0}, respectively. We 
set the standard deviation of Zij to 2 instead of 1, which allows us to test 
whether the rescaling procedure works properly. We tried different values of 
7 and n, but set Ai and A2 to 4(1 + ^7) and 2(1 + ^7), respectively. 

We split the simulated samples into test and training sets, each with n 
samples. We first estimated the asymptotic shrinkage factor based on the 
training samples. We then calculated the predicted PC scores on the test 
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Fig. 2. Shrinkage-adjusted PC scores of the data in Figure 1. Different symbols represent 
different groups. White background color represents the training set and grey background 
color represents the test set. (a) Plots of all simulation samples, (b) Center of each group. 



samples. To assess the accuracy of shrinkage factor estimator for each PC, 
we empirically estimated the shrinkage factor by the ratio of the mean pre- 
dicted PC scores of the test samples to the mean PC scores of the training 
samples. That is, for the vih. PC, the empirical shrinkage factor is esti- 
mated by ^/Sr=i 9^/Sfc=iP^fc- On the training samples, we also estimated 
the empirical angle between the sample and (known) population eigenvec- 
tors, as well as the empirical angle between PC scores from sample and 
population eigenvectors. The asymptotic theoretical estimates were also cal- 
culated. Tables 1 and 2 summarize the simulation results. Our asymptotic 
estimators provide accurate estimates for the angles and the shrinkage fac- 
tor. 

Finally, we conducted simulation to demonstrate an application of the 
bias-adjusted PC scores in PC regression. PC regression has been widely 
used in microarray gene-expression studies [9]. In this simulation, we let 
p = 5000, and our set up is very similar to the first simulation of Bair et 
al. [8]. Let Xy denote the gene expression level of the ith gene for the jth 
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Table 1 

Cosine angle estimates of eigenvectors and PC scores based on 1000 simulations. 
"Angle" indicates the theoretical asymptotic cos(angle), "Estimatel" indicates the 
empirical cos(angle) estimator, "Estimate2" indicates the asymptotic cos(angle) 
estimator. For each estimator, each entry represents mean of 1000 simulation 
results with standard error in parentheses 



PC 1 PC 2 

Angle Angle Angle Angle 

7 n Angle Estimatel Estimate2 Angle Estimatel Estimate2 



Eigenvectors 



1 


100 


0.93 


0.93 


(0.013) 


0.91 


(0.027) 


0.82 


0.81 


(0.053) 


0.80 


(0.052) 




200 




0.93 


(0.009) 


0.92 


(0.014) 




0.81 


(0.030) 


0.81 


(0.032) 


20 


100 


0.70 


0.69 


(0.037) 


0.70 


(0.031) 


0.51 


0.50 


(0.053) 


0.50 


(0.058) 




200 




0.69 


(0.023) 


0.70 


(0.022) 




0.51 


(0.036) 


0.51 


(0.041) 


100 


100 


0.53 


0.53 


(0.034) 


0.53 


(0.031) 


0.37 


0.35 


(0.043) 


0.35 


(0.047) 




200 




0.53 


(0.024) 


0.53 


(0.024) 




0.36 


(0.029) 


0.36 


(0.033) 


500 


100 


0.38 


0.38 


(0.029) 


0.38 


(0.028) 


0.25 


0.24 


(0.033) 


0.24 


(0.037) 




200 




0.38 


(0.020) 


0.38 


(0.020) 




0.25 


(0.021) 


0.25 


(0.024) 


PC 


scores 






















1 


100 


0.99 


0.99 


(0.004) 


0.98 


(0.016) 


0.94 


0.93 


(0.036) 


0.91 


(0.048) 




200 




0.99 


(0.003) 


0.99 


(0.006) 




0.94 


(0.019) 


0.93 


(0.024) 


20 


100 


0.98 


0.97 


(0.083) 


0.98 


(0.008) 


0.89 


0.86 


(0.105) 


0.87 


(0.055) 




200 




0.97 


(0.055) 


0.98 


(0.005) 




0.88 


(0.073) 


0.88 


(0.036) 


100 


100 


0.97 


0.97 


(0.079) 


0.97 


(0.009) 


0.88 


0.85 


(0.109) 


0.86 


(0.060) 




200 




0.97 


(0.058) 


0.97 


(0.006) 




0.86 


(0.076) 


0.87 


(0.039) 


500 


100 


0.97 


0.96 


(0.084) 


0.97 


(0.010) 


0.87 


0.83 


(0.117) 


0.84 


(0.069) 




200 




0.96 


(0.058) 


0.97 


(0.007) 




0.86 


(0.076) 


0.86 


(0.038) 



subject. We generated each according to 

(3 + e, i<g,j<n/2, 

%ij = < 4 + e, i<g,j>n/2, 

I 3.5 + e, i > g, 
and the outcome variable yj as 




where n is the number of samples, g is the number of genes that are dif- 
ferentially expressed and associated with the phenotype, e~iV(0, 2 2 ) and 
e y ~ iV(0, 1). A total of eight different combinations of n and g were simu- 
lated. For the training data, we fit the PC regression with the first PC as 
the covariate and computed the mean square error (MSE) . For the test sam- 
ples with the same configuration of the training samples, we applied the PC 
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Table 2 

Shrinkage factor estimates based on 1000 simulation. "Factor" indicates the theoretical 
asymptotic factor, "Estimatel " indicates the empirical shrinkage factor estimator, 
"Estimate2" indicates the asymptotic shrinkage factor estimator. For each estimator, 
each entry represents mean of 1000 simulation results with standard error in parentheses 









PC 1 






PC 2 










Factor 


Factor 




Factor 


Factor 


7 


n 


Factor 


Estimatel 


Estimate2 


Factor 


Estimatel 


Estimate2 


1 


100 


0.88 


0.88 (0.017) 


0.87 (0.076) 


0.75 


0.75 (0.044) 


0.76 (0.063) 




200 




0.88 (0.013) 


0.87 (0.054) 




0.75 (0.027) 


0.75 (0.044) 


20 


100 


0.51 


0.51 (0.037) 


0.51 (0.038) 


0.33 


0.34 (0.033) 


0.32 (0.038) 




200 




0.51 (0.025) 


0.51 (0.026) 




0.34 (0.022) 


0.33 (0.028) 


100 


100 


0.30 


0.30 (0.024) 


0.30 (0.030) 


0.17 


0.17 (0.019) 


0.17 (0.023) 




200 




0.30 (0.017) 


0.30 (0.023) 




0.18 (0.013) 


0.17 (0.017) 


500 


100 


0.16 


0.15 (0.014) 


0.16 (0.020) 


0.08 


0.08 (0.010) 


0.08 (0.013) 




200 




0.15 (0.010) 


0.16 (0.014) 




0.08 (0.007) 


0.08 (0.009) 



model built on the training data to predict the phenotypes using the unad- 
justed and adjusted PC scores. The results are presented in Table 3. We see 
that the MSE of the test set without bias adjustment is appreciably higher 
than that of the test set with bias adjustment, and the MSE of the test set 
with bias adjustment is comparable with the MSE of the training set. 

4. Real data example. Here, we demonstrate that the shrinkage phe- 
nomenon appears in real data, and can be adjusted by our method. For this 
purpose, genetic data on samples from unrelated individuals in the Phase 3 
HapMap study (http : / /hapmap . ncbi . nlm . nih . gov/) were used. HapMap 

Table 3 

Mean Sguare Error (MSE) of the PC regression based on gene-expression microarray 
data simulation with and without shrinkage adjustment. 1000 simulation were conducted. 
Each entry in the table represents mean of the MSE with standard error in parentheses 



n 


9 


Test data 
without adjustment 


Test data 
with adjustment 


Training data 


100 


150 


1.97 (0.256) 


1.70 (0.284) 


1.61 (0.284) 


100 


300 


1.63 (0.230) 


1.17 (0.167) 


1.12 (0.158) 


100 


500 


1.43 (0.204) 


1.07 (0.157) 


1.03 (0.147) 


100 


1000 


1.22 (0.182) 


1.03 (0.148) 


0.99 (0.142) 


200 


150 


1.73 (0.159) 


1.33 (0.133) 


1.30 (0.131) 


200 


300 


1.39 (0.139) 


1.08 (0.105) 


1.07 (0.110) 


200 


500 


1.24 (0.131) 


1.04 (0.105) 


1.01 (0.101) 


200 


1000 


1.10 (0.114) 


1.02 (0.101) 


1.00 (0.101) 
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is a dense genotyping study designed to elucidate population genetic differ- 
ences. The genetic data are discrete, assuming the values 0, 1 or 2 at each 
genomic marker (also known as SNPs) for each individual. Data from CEU 
individuals (of northern and western European ancestry) were compared 
with data from TSI individuals (Toscani individuals from Italy, representing 
southern European ancestry). 

Some initial data trimming steps are standard in genetic analysis. We 
first removed apparently related samples, and removed genomic markers 
with more than a 10% missing rate, and those with frequency less than 0.01 
for the minor genetic allele. To avoid spurious PC results, we further pruned 
out SNPs that are in high linkage disequlibrium (LD) [11]. Lastly, we ex- 
cluded 7 samples with PC scores greater than 6 standard deviations away 
from the mean of at least one of the top significant PCs [i.e., with Tracy- 
Widom (TW) Test p- value <0.01] [24, 27]. The final dataset contained 178 
samples (101 CEU, 77 TSI) and 100,183 markers. We mean-centered and 
variance-standardized the genotypes for each marker [27]. The screeplot of 
the sample eigenvalues is presented in Figure 3. The first eigenvalue is sub- 
stantially larger than the rest of the eigenvalues, although the TW test 
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Fig. 3. Scree plot of the first 30 sample eigenvalues, CEU + TSI dataset. 
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Fig. 4. An instance with and without shrinkage adjustment, performed on Hapmap 
CEU(*) and TSI(+). "*" and "+" represent PC scores using all data. The 161st sample 
was excluded from PCA, and PC score for it was predicted. The grey rectangle represents 
the predicted PC score without shrinkage adjustment and the grey circle represents the 
predicted PC score after the shrinkage adjustment. 



actually identifies two significant PCs. Figure 3 suggests that our data ap- 
proximately satisfies the spiked eigenvalue assumption. 

We estimated the asymptotic shrinkage factor and compared it with the 
following jackknife-based shrinkage factor estimate. For the first PC, we first 
computed the scores of all samples. Next, we removed one sample at a time 
and computed the (unadjusted) predicted PC score. We then calculated 
the jackknife estimate as the square root of the ratio of the means of the 
sample PC score and the predicted PC score. The jackknife shrinkage factor 
estimate is 0.319, which is close to our asymptotic estimate 0.325. Figure 4 
shows the PC scores from the whole sample, the predicted PC score of an 
illustrative excluded sample, and its bias-adjusted predicted score. Clearly, 
the predicted PC score without adjustment is very biased toward zero, while 
the bias-adjusted PC score is not. 

5. Discussion and conclusions. In this paper, we have identified and ex- 
plored the shrinkage phenomenon of the predicted PC scores, and have de- 
veloped a novel method to adjust these quantities. We also have constructed 



CONVERGENCE AND PREDICTION OF PC SCORES 



15 



the asymptotic estimator of correlation coefficient between PC scores from 
population eigenvectors and sample eigenvectors. In simulation experiments 
and real data analysis, we have demonstrated the accuracy of our estimates, 
and the capability to increase prediction accuracy in PC regression by adopt- 
ing shrinkage bias adjustment. For achieving these, we consider asymptotics 
in the large p, large n framework, under the spiked population model. 

We believe that this asymptotic regime applies well to many high-dimen- 
sional datasets. It is not, however, the only model paradigm applied to such 
data. For example, the large p small n paradigm [1, 14], which assumes 
p/n — > oo, has also been explored. Under this assumption, Jung and Mar- 
ron [20] have shown that the consistency and the strong inconsistency of 
the sample eigenvectors to population eigenvectors depend on whether p in- 
creases at a slower or faster rate than . It may be argued that for real data 
where p/n is "large," we should follow the paradigm of Hall, Marron and 
Neeman [14], Ahn et al. [1]. However, for any real study, it is unclear how 
to test whether p increases at a faster rate than A^, or vice versa, making 
the application of Hall, Marron and Neeman [14], Ahn et al. [1] difficult in 
practice. Furthermore, the scenario where p and A„ grow at the same rate is 
scientifically more interesting, for which we are aware of no theoretical re- 
sults. In contrast, our asymptotic results can be straightforwardly applied. 
Further, our simulation results indicate that for p/n as large as 500, our 
asymptotic results still hold well. We believe that the approach we describe 
here applies to many datasets. 

Although the results from the spiked model are useful, it is likely that ob- 
served data has more structure than allowed by the model. Recently, several 
methods have been suggested to estimate population eigenvalues under more 
general scenarios [10, 29]. However, no analogous results are available for the 
eigenvectors. In data analysis, jackknife estimators, as demonstrated in the 
real data analysis section, can be used. However, resampling approaches are 
very computationally intensive, and it remains of interest to establish the 
asymptotic behavior of eigenvectors in a variety of situations. 

We note that inconsistency of the sample eigenvectors does not necessar- 
ily imply poor performance of PCA. For example, PCA has been success- 
fully applied in genome-wide association studies for accurate estimation of 
ethnicity [27], and in PC regression for microarrays [21]. However, for any 
individual study we cannot rule out the possibility of poor performance of 
the PC analysis. Our asymptotic result on the correlation coefficient between 
PC scores from sample and population eigenvectors provides us a measure 
to quantify the performance of PC analysis. 

For the CEU/TSI data, SNP pruning was applied to adjust for strong 
LD among adjacent SNPs. Such SNP pruning is a common practice in the 
analysis of GWAS data, and has been implemented in the popular GWAS 
analysis software Plink [28]. The primary goal of SNP pruning is to avoid 
spurious PC results unrelated to population substructures. Technically, our 
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approach does not rely on any independence assumption of the SNPs. How- 
ever, strong local correlation may affect eigenvalues considerably. Thus, the 
value in SNP pruning may be viewed as helping the data better accord with 
the assumptions of the spiked population model. From the CEU/TSI data 
and our experience in other GWAS data, we have found that the most com- 
mon pruning procedure implemented in Plink is sufficient for us to then 
apply our methods. 

6. Proofs. Note that EA 1 / 2 ZZ T A 1 / 2 E T and A 1 / 2 ZZ T A 1 / 2 have the same 
eigenvalues, and E r U is the eigenvector matrix of A 1 ^ 2 ZZ r A 1//2 . Since 
eigenvalues and angles between sample and population eigenvectors are what 
we concerned about, without loss of generality (WLOG), in the sequel, we 
assume A to be the population covariance matrix. 

6.1. Notation. We largely follow notation in Paul [26]. We denote A„(S) 
as the fth largest eigenvalue of S. Let suffice A represent the first m coor- 
dinates and B represent the remaining coordinates. Then we can partition 
S into 

Saa ^ab 
Sba &bb 

We similarly partition the vth eigenvector into (u^ it ,UB„) and Z T into 
[Z^,Zjg]. Define R v as ||u.B,t>|| and let = u^/yl — i? 2 , then we get 
||a„|| = 1. 

Applying singular value decomposition (SVD) to Z^/y^, we get 
(9) -iz B = VM 1/2 H T , 

where M = diag(//i, . . . , fJ, p -m) is a (p — m) x (p — m) diagonal matrix of 
ordered eigenvalues of Sbb, V is a (p — m) x (p — m) orthogonal matrix 
and H is an n x (p — m) matrix. For n > p — m, H has full rank orthogonal 
columns. When n < p — m, H has more columns than rows, hence it does not 
have full rank orthogonal columns. For the later case, we make H = [H n ,0] 
where H n is an n x n orthogonal matrix. 

6.2. Propositions. We introduce two propositions for later use. The proofs 
of the two propositions can be found in Sections 6.5 and 6.6. 

Proposition 1. Suppose Y is an n x m matrix with fixed m and each 
entry of Y is i.i.d. random variable which satisfies the moment condition 
of Zij in Assumption 1. Let C be an n x n symmetric nonnegative definite 
random matrix and independent o/Y. Further, assume ||C|| =0(1). Then 

— Y T CY — — trace(C)I A 

n n 

as n — > oo . 
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Proposition 2. Suppose y is an n- dimensional random vector which 
follows the same distribution of the row vectors of Y and independent of 
Sbb- Let f(x) be a bounded continuous function on [(1 — - v /7) 2 , (1 + -v/7) 2 ] 
and f(0) = 0. Suppose F = diag(/(^i), . . . , /(/x p _ m )) ; where {m}?!™ are or- 
dered eigenvalues o/M which is defined on (9), then 

-y T HFH T y- 7 / f(x) dF^x) 4 
n J 

as n— >oo, where Fry(x) is a distribution function of Marcenko-Pastur law 
with parameter 7 [22]. 

6.3. Proof of part (i) of Lemma 1. 

6.3.1. When p is fixed. By the strong law of large numbers, S ^4' A. 
Since eigenvalues are continuous with respect to the operator norm, the 
lemma follows after applying continuous mapping theorem. 

6.3.2. When p — > 00. For every small e > 0, there exist p{n) and -y e 
such that p{n)/n — > 7 e > 0, A„(l + 7 E /(A« — 1)) < A^ + e for all v < m, 
(1 + ^J 7 %) 2 < 1 + s and (1 — yf%) 2 > 1 — e. For simplicity, we denote p{n) 
as p. Suppose Zp is a p x n matrix that satisfies the moment condition of 
Zij in Assumption 1. Define an augmented data matrix X T = [Z T A,Zj] T 

and its sample covariance matrix S = XX T . Let S be a p x p upper left 
submatrix of S. We also let S be an (m + 1) x (m + 1) upper left submatrix 
of S. For v < (m + 1), by the interlacing inequality (Theorem 4.3.15 of Horn 
and Johnson [15]), 

A„(S)<A„(S)<A„(S). 

Since A„(S) ^4' X v , A„(S) ^4' A„(l + J E /(X V — 1)) < 1 + e for v < m, and 
A„(S) ^4' (1 + a/t^) 2 < 1 + e for u = m + 1, we have 

A„ - o(l) < A„(S) < A„ + e + o(l) for v<m + l. 

Thus, 

(10) A„(S)^4'A„ faru<m + l. 
Similarly by the interlacing inequality, we get 

Xp(S)<X p (S)<\ m+1 (S). 
Since A m+ i(5) ^4' 1 and Xp(S) ^4' (1 — ^/^) 2 > 1 — e, we conclude that 

(11) A p (S) a 4i. 

Part (i) of Lemma 1 follows by (10) and (11). 
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6.4. Proof of part (ii) of Lemma 2. Our proof of Lemma 2(ii) closely 
follows the arguments in Paul [26]. From [26], it can be shown that 

(12) (s AA + ^A A /2 Z A UM(d v I - M)- 1 H t z5aY 2 ^ a,, = d v Si v 
and 

(13) ^ (i + iAy 2 Z A HM(cU - Mr 2 H T Z^AY 2 ) a, = , 
where A A = diag{Ai, . . . , A m }. 

6.4.1. When X v > 1 + ^7. We can show that 



(14) 
and 



(a v ,e AtV ) -)■ 1 



(15) -z^HM(d„I - M)^H r z^ A { 



1 
10, 



(p v - x) 2 



dF^(x), 



for 7 > 0, 
for 7 = 0, 



where e Av is a vector of the first m coordinates of the v th population eigen- 
vector e v , p v is X v (l + A 1 _ 1 ) and z Av is a vector of vth row of Z A . The 
proofs can be found in Section 6.4.3. Note that e„ is a vector with 1 in its 
vth. coordinate and elsewhere. WLOG, we assume that (e v ,u v ) > 0. Since 
(e v ,u v ) = y/1 - B%(eA >v ,av), {e v ,u v ) A y/l - R 2 . By (13) and (15), we can 
show that 



(16) 



1 P I 1 + A t ,7 



l-Rl 



(Pv - x) [ 



1. 



for 7 > 0, 
for 7 = 0. 



From Lemma B.2 of [26], 
(17) 
Thus, 



(Pv - x)' 



■dF y (x) 



(A,-l) 2 -7" 



(18) ^/l^R 2 ^ { 



1 



1 + 



1 



Xv-l 



U 



for 7 > 0, 
for 7 = 0. 



It concludes the proof of the first part of Lemma 2 (ii) - 



6.4.2. When 1 < A„ < 1 + ^7. Here, we only need to consider 7 > 
because no eigenvalue satisfies this condition when 7 = 0. We first show 
that A 1, which implies -A 0, hence (e v ,u v ) A 0. For any e > and 
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x > 0, define 
and 



x, if x > e, 
e, if x < s, 



G £ = diag(d„/(((4 - pi) ) £ , . . . ,d v /((d v - n P - m ) ) £ ), 
then by Propositions 1 and 2, 

(19) ±z T Av HG £ H T z Av 4 7 J {{pv ^ )2)£ dF y (x). 
By monotone convergence theorem, 

(20) "Iw^ dF ^^I(^ iF ^- 

The right-hand side of (20) is 

6 y /{b-x){x- a) 
2ir(p v — x 



(21) / V ;. / , J dx, 



where a = (1 — - v /t") 2 and b = (1 + ^/t") 2 . Since (21) equals oo for any a < 
Pv < b, we conclude that 

(22) -z5„HM(4I - M)" 2 H T z A , ; A oo. 



Therefore i? t , A 1, which proves the second part of Lemma 2(h). 
6.4.3. Proof of (U) and (15). Define 

m . 

T7 \ ~* T 

Uv = 1^ TJ\, V^ eA ' ke Ak> 

T^v = Saa + Sab(^I - S B b) _1 S B a - (Pv/K)A-a, 
a v = WR-vDvW + \d v — p v \\\TZ v \\ and f3 v = \\1Z v V v eA, v \\- 
With the exactly same argument of [26], it can be shown that 

where r„ = -(1 - (e^a^e^i, -TZ v V v (a v — e^ )V ) + (d v — p v )TZ v (a v -eA, v )- 
By Lemma 1 of [25], r v = o p (l), if a v = o p (l) and j3 v = o p (l). 

When 7 = 0, Saa — (Pvf^v)-^A and the remainder of V v is 

(23) S AB (d v I - 3 B b)- x Sba = -A A /2 Z A HM(d v I - M^V Z^A 1 / 2 . 

n 

Since d v ^4' and pi ^4* 1 , 

||HM(4I - M) _1 H r || ^ 1/(A„ - 1). 
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By Proposition 1, 
(24) < 



<Ai 



PfJ-i 



n(d v - m) 

hence V v = o p (l). 

When 7 > 0, V v can be written as 

V v = [S AA - A A ) 
1 



+ o p (l) = o p (l) 



+ 



1/2 



■ZaHM(p„I - M)~ 1 H T Z j4 



n 



(25) 



- trace(M(/o„I - M) _1 )I ) A 1/2 
n 



+ 



+ 



-trace(M(^I-M) _1 ) -7 / — - — dFJx) )A A 
n J Pv-x ) 

( Pv - 4)-aY 2 z a hm(pJ - My\d v i - m)- 1 h t z a aY 2 



The first term of the right-hand side is o p (l) by the weak law of large number. 
The second and third terms are o p (l) by Propositions 1 and 2. For the fourth 
term, p v — d v = o p (l) and its remainder part is O p (l). Therefore, V v = o p (l). 
By combining the above results and TZ V = O p (l) plus d v — p v = o p (l), we 
prove (14). 

For (15): When 7 = 0, (15) can be proved by the exactly same way used 
to show (24). When 7 > 0, d v p v , and p\ ^4-' (1 + ^t") 2 < p v , hence ||C|| ^4' 
(p -(i^jj)iyi ■ Therefore, the result follows according to Propositions 1 and 2. 

6.5. Proof of Proposition 1. Let p,± > p>i > • • • > p, n be the ordered eigen- 
values of C, and be the (i, j)th element of C. Suppose y s is the sth col- 
umn of Y, and j/y is the (i,j)th element of Y. We further define ip(s,s) = 
^yj Cy s — - trace(C) and i/j(s, t) = ^yJCyt for s^t. The conditional mean 
of ip(s,s) given C is 

(\ \ 1 n 

E{ip{s,s)\C) = El-^2cijyi S y js \CJ --^(M 

^ n i,j ' n i=i 

j n 2 n 1 U 

(26) = - Yl caEiVis) + ~Y1 CijE(y is y js ) - - ^ Mi 



i=l 



j n 1 n 

- V C« - - V 

n n 



Mi 



= 0. 



i=l 



Thus, E(ip(s, s)) = E(E(tp(s, s)|C)) = £(0) = 0. 
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Next, the conditional variance of tp(s,s) given C is 

Var(^(s,s)|C) = — Var ( ^ Cjjy is yj S | C 



n 



~2 ^2 c ij c ig Cov (yisyjs,yisy q s) 



(27) 

4 n 4 n 

ij'=l j,j=l 

= — trace(C = — > fi { , 
n z n z — ' 

i=l 

where a = max(l,E(yf s ) - 1). Since ||C|| = 0(1), $ < ll c l| 2 = 0(1). There- 
fore, Var(^(s,s)|C) < 0(l/n) and Var(^(s, s)) = Var(£(^(s, s)|C)) + 
i?(Var(^(s, s)|C)) < + O(l/?i) — >• as n— >oo. By the Chebyshev inequal- 
ity, we can conclude that 

^( s , s )4o. 

We can similarly show ip(s,t) A 0, which we omit here. 
6.6. Proof of Proposition 2. Consider an expansion 

iy T HFH T y- 7 f f{x)dF 1 {x) 
n J 

-y T UFU T y - - trace(F) 
n n 



+ 



-!- trace(F) - 7 J f(x)dF 1 {x) 



= (a) + (b). 

We show that both (a) and (b) converge to in probability. 

(a) : Since /ii ^ (1 + y^) 2 , ^ min ( p - m , n ) ^4' (1 - y^) 2 , = for k > 
mm(p — m,n) and f(x) is continuous and bounded on [(1 — ^/7) 2 , (l + \/7) 2 ]) 
there exists A' > such that sup i |/(/Xj)| < K a.s. Let C = HFH T , then 
trace(C) =trace(F). By Proposition 1, (a) = o p (l). 

(b) : Let F p — m be an empirical spectral distribution of Sbb^ then 



-trace(F) = ^ / f(x)dF p _ m (x) 

n n J 
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and / f(x)dF n (x) 4 f f(x) dF y (x) [5, 22]. Thus, 

p — m 
n 



f{x)dF p _ m {x)^ 1 / f{x)dF 1 {x) 



which shows that (b) = o p (l). 

Combining (a) and (b), we finish the proof. 

6.7. Proof of Theorem 1 . Without loss of generality, we assume (g„, p v ) > 
0. Let e v = {e^ v ,eB, v } , then &a,v is the vector with 1 in uth coordinate and 
elsewhere, and es, v is the zero vector. Since Saa^-A,v + Sab^-b,v = d v UA, v , 
we have 

(Sv,Pv) = — ^=^=^XX T u v 



ny/d v X v v 



(28) 



^a^Saa^a,v/ V d v X v + e^ v SAB^B,v/ V d v \ v 



d V f 



\J d v X v ' V X v 



(A,-1)V' 



for X v > 1 + ^/7, 
for 1< A„ < 1 + yfy. 



6.8. Proof of Theorem 2. First, we show the square of the denominator 

2 i 
vj' 



converges to p(X v ). Since p v j = u„ Xj, and E(pl i ) = S(pL) for i ^ j 



^) = ^(e^)=>(E(^) 2 ) 

(29) 

= E{ulxX T vL v /n) = E(d v ) a 4' p(X v ). 

Next, we show the square of numerator converges to (p(X v ) 2 (X v — 1) + 1. 

Define u„ := , ==(/ — e v ez)u v , then u v can be expressed as 
Vi-(u? e^) 2 



u„ = (u^e„)e„ + y 1 - (u^e„) 2 u^-. 

Partition u^J" = {irj[ w , w }. From (14), A e^,, therefore w A and 
u Bi) U iJi> 1- Since x new and u„ are independent, we have 

E(q%\u v ) = E'((u^x new ) 2 |u 1 ,) = u^£ , (x new Xn ew |u, ; )u^ = ujAu„ 
= (u^e^e^Ae,, + (1 - (u^e„) 2 )u^ T Au^ 



+ 2u^e v J 1 - (u^e v ) 2 e^Aui 
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(30) 

= «e„) 2 A„ + (1 - (u^e^) 2 )(u^A A u^„ + ugu^ ) 



+ 2u„e v J 1 - (u^e^e^A^u^ 



4«/»(A 1) ) 2 (A 1) -l) + l. 
From (29) and (30), 



(31) 



/ E{ql) 4>{\ v y{\ v -l) + l (X v - 1) 

E( P 2 J ^ P(K) (A„ + 7 -l)' 



6.9. Proof of Theorem 3. Since p 1 {pr v ) — > X v for v < k, WLOG we as- 
sume that ko = k, where k is the number of A„ bigger than 1 + y^y. Set 

k 

(32) ft,(x) = p~ 1 (r v x) + p — k — x. 

v=l 

The first and second partial derivatives of h(x) are 

(33) g/t(x) _ 1 ly> (xr„ - (l + 7))r„ _ ] 

^ ~ 2 t[ V 2 ^V / (^-(l + 7)) 2 -4 7 ' 

(34) ^M = 2 V =^ < 

1 ' 8x 2 ^ ((^_(l + 7 ))2_4 7 )3/2 ^ ' 

so fo(x) is a concave function of x given r v . From the fact that /9 _1 (r\,p) > 1 
for v < k, we know h(p) > 0. Because of the concave nature of this function, 
h{x) = has a unique solution r on [p, oo), which X^=i ~~ m i con " 

verges to. Thus, c4 = rr„. Define d v = r v uj where ui = Ylv=i +P — k, and 
set d v as the sample eigenvalue when a 2 = 1. The sum of all d v is 

p p n 

(35) ^^ = -trace(ZZ T A) = -^^Ai4, 

v=l i=l j=l 

thus 
(36) 
and 

(37) Var ( dv ) = - A " - 1) -> 0. 

\ w J n ca z 

By (36) and (37), 

(38) J]d„/w = l + o p (l). 



\ cj / CO 
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Since d v — > p(X v ) for v < k, 

p 

(39) d v = d v u l^^d v = d v (l + o p (l)) -4 p(X v ). 

v=l 

Now, we show that r = ui + o p (l). Plugging u into /i(x) and combining the 
fact that p~ 1 {d v ) = X v + o p (l), we get 

k k 

(40) h(u J ) = Y,P~ 1 (d v )-J2^ = o p (l). 

v=l v=l 

From the facts that h(x) is a continuous concave function, ui > p, and h(p) > 
0, we conclude that 

(41) LJ = T + O p (l). 

Therefore, 

(42) (2 t . = r„r = r v (u + o p (l)) = d„ + o p (l) A p(A.„) 
for v < k, which concludes the proof. 
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